Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

NUM: 8
BOOL: 1

Warnings

Pregnancies has 111 (14.5%) zeros

Reproduction

Analysis started: 2021-12-04 11:52:26.982487
Analysis finished: 2021-12-04 11:52:39.802395
Duration: 12.82 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Pregnancies
Real number (ℝ≥0)

ZEROS

Distinct: 17
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.845052083
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 6
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
Monotonicity: Not monotonic
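Several of the figures reported for a variable are derived from others, which gives a quick consistency check. The sketch below recomputes the coefficient of variation, the interquartile range and the range from the Pregnancies values listed above. The variable names are illustrative and are not part of the report or of pandas-profiling.

```python
# Values copied from the Pregnancies statistics in this report.
mean = 3.845052083
std = 3.369578063
variance = 11.35405632
q1, q3 = 1, 6
minimum, maximum = 0, 17

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

# Range: distance between the smallest and largest value.
value_range = maximum - minimum

# The variance should equal the squared standard deviation
# (up to the rounding of the printed figures).
variance_from_std = std ** 2

print(f"CV={cv:.6f} IQR={iqr} range={value_range} std^2={variance_from_std:.6f}")
```

The same relationships should hold for every numeric variable in the report, so a mismatch would point to a transcription error rather than to a property of the data.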
Histogram with fixed size bins (bins=17)
Value               Count  Frequency (%)
1                   135    17.6%
0                   111    14.5%
2                   103    13.4%
3                   75     9.8%
4                   68     8.9%
5                   57     7.4%
6                   50     6.5%
7                   45     5.9%
8                   38     4.9%
9                   28     3.6%
Other values (7)    58     7.6%

Value               Count  Frequency (%)
0                   111    14.5%
1                   135    17.6%
2                   103    13.4%
3                   75     9.8%
4                   68     8.9%

Value               Count  Frequency (%)
17                  1      0.1%
15                  1      0.1%
14                  2      0.3%
13                  10     1.3%
12                  9      1.2%

Glucose
Real number (ℝ≥0)

Distinct: 136
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121.681605
Minimum: 44
Maximum: 199
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 44
5-th percentile: 80
Q1: 99.75
Median: 117
Q3: 140.25
95-th percentile: 181
Maximum: 199
Range: 155
Interquartile range (IQR): 40.5

Descriptive statistics

Standard deviation: 30.43601564
Coefficient of variation (CV): 0.2501283217
Kurtosis: -0.2588195957
Mean: 121.681605
Median Absolute Deviation (MAD): 20
Skewness: 0.5332247527
Sum: 93451.47266
Variance: 926.3510483
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
99                  17     2.2%
100                 17     2.2%
111                 14     1.8%
106                 14     1.8%
125                 14     1.8%
129                 14     1.8%
105                 13     1.7%
112                 13     1.7%
95                  13     1.7%
102                 13     1.7%
Other values (126)  626    81.5%

Value               Count  Frequency (%)
44                  1      0.1%
56                  1      0.1%
57                  2      0.3%
61                  1      0.1%
62                  1      0.1%

Value               Count  Frequency (%)
199                 1      0.1%
198                 1      0.1%
197                 4      0.5%
196                 3      0.4%
195                 2      0.3%

BloodPressure
Real number (ℝ≥0)

Distinct: 47
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.25480652
Minimum: 24
Maximum: 122
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 24
5-th percentile: 52
Q1: 64
Median: 72
Q3: 80
95-th percentile: 90
Maximum: 122
Range: 98
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.1159316
Coefficient of variation (CV): 0.1676833997
Kurtosis: 1.079233365
Mean: 72.25480652
Median Absolute Deviation (MAD): 8
Skewness: 0.1730502132
Sum: 55491.69141
Variance: 146.7957985
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=47)
Value               Count  Frequency (%)
70                  57     7.4%
74                  52     6.8%
78                  45     5.9%
68                  45     5.9%
72                  44     5.7%
64                  43     5.6%
80                  40     5.2%
76                  39     5.1%
60                  37     4.8%
69.10546875         35     4.6%
Other values (37)   331    43.1%

Value               Count  Frequency (%)
24                  1      0.1%
30                  2      0.3%
38                  1      0.1%
40                  1      0.1%
44                  4      0.5%

Value               Count  Frequency (%)
122                 1      0.1%
114                 1      0.1%
110                 3      0.4%
108                 2      0.3%
106                 3      0.4%

SkinThickness
Real number (ℝ≥0)

Distinct: 51
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.60647922
Minimum: 7
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 7
5-th percentile: 14.35
Q1: 20.53645833
Median: 23
Q3: 32
95-th percentile: 44
Maximum: 99
Range: 92
Interquartile range (IQR): 11.46354167

Descriptive statistics

Standard deviation: 9.631240711
Coefficient of variation (CV): 0.3619885454
Kurtosis: 3.904657462
Mean: 26.60647922
Median Absolute Deviation (MAD): 5
Skewness: 1.226669961
Sum: 20433.77604
Variance: 92.76079763
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
20.53645833         227    29.6%
32                  31     4.0%
30                  27     3.5%
27                  23     3.0%
23                  22     2.9%
33                  20     2.6%
18                  20     2.6%
28                  20     2.6%
31                  19     2.5%
19                  18     2.3%
Other values (41)   341    44.4%

Value               Count  Frequency (%)
7                   2      0.3%
8                   2      0.3%
10                  5      0.7%
11                  6      0.8%
12                  7      0.9%

Value               Count  Frequency (%)
99                  1      0.1%
63                  1      0.1%
60                  1      0.1%
56                  1      0.1%
54                  2      0.3%

Insulin
Real number (ℝ≥0)

Distinct: 186
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 118.660163
Minimum: 14
Maximum: 846
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 14
5-th percentile: 50
Q1: 79.79947917
Median: 79.79947917
Q3: 127.25
95-th percentile: 293
Maximum: 846
Range: 832
Interquartile range (IQR): 47.45052083

Descriptive statistics

Standard deviation: 93.08035765
Coefficient of variation (CV): 0.7844280277
Kurtosis: 14.14170435
Mean: 118.660163
Median Absolute Deviation (MAD): 3
Skewness: 3.291825025
Sum: 91131.00521
Variance: 8663.952981
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
79.79947917         374    48.7%
105                 11     1.4%
130                 9      1.2%
140                 9      1.2%
120                 8      1.0%
180                 7      0.9%
100                 7      0.9%
94                  7      0.9%
110                 6      0.8%
135                 6      0.8%
Other values (176)  324    42.2%

Value               Count  Frequency (%)
14                  1      0.1%
15                  1      0.1%
16                  1      0.1%
18                  2      0.3%
22                  1      0.1%

Value               Count  Frequency (%)
846                 1      0.1%
744                 1      0.1%
680                 1      0.1%
600                 1      0.1%
579                 1      0.1%

BMI
Real number (ℝ≥0)

Distinct: 248
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.45080516
Minimum: 18.2
Maximum: 67.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 22.235
Q1: 27.5
Median: 32
Q3: 36.6
95-th percentile: 44.395
Maximum: 67.1
Range: 48.9
Interquartile range (IQR): 9.1

Descriptive statistics

Standard deviation: 6.875373507
Coefficient of variation (CV): 0.21187066
Kurtosis: 0.9213175059
Mean: 32.45080516
Median Absolute Deviation (MAD): 4.5
Skewness: 0.6011031773
Sum: 24922.21836
Variance: 47.27076087
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
31.99257812         11     1.4%
32.4                10     1.3%
33.3                10     1.3%
30.8                9      1.2%
32.9                9      1.2%
32.8                9      1.2%
30.1                9      1.2%
Other values (238)  664    86.5%

Value               Count  Frequency (%)
18.2                3      0.4%
18.4                1      0.1%
19.1                1      0.1%
19.3                1      0.1%
19.4                1      0.1%

Value               Count  Frequency (%)
67.1                1      0.1%
59.4                1      0.1%
57.3                1      0.1%
55                  1      0.1%
53.2                1      0.1%

DiabetesPedigreeFunction
Real number (ℝ≥0)

Distinct: 517
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763021
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
Median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.268               5      0.7%
0.261               5      0.7%
0.207               5      0.7%
0.238               5      0.7%
0.259               5      0.7%
0.551               4      0.5%
0.692               4      0.5%
0.284               4      0.5%
Other values (507)  719    93.6%

Value               Count  Frequency (%)
0.078               1      0.1%
0.084               1      0.1%
0.085               2      0.3%
0.088               2      0.3%
0.089               1      0.1%

Value               Count  Frequency (%)
2.42                1      0.1%
2.329               1      0.1%
2.288               1      0.1%
2.137               1      0.1%
1.893               1      0.1%

Age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.24088542
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 7
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
22                  72     9.4%
21                  63     8.2%
25                  48     6.2%
24                  46     6.0%
23                  38     4.9%
28                  35     4.6%
26                  33     4.3%
27                  32     4.2%
29                  29     3.8%
31                  24     3.1%
Other values (42)   348    45.3%

Value               Count  Frequency (%)
21                  63     8.2%
22                  72     9.4%
23                  38     4.9%
24                  46     6.0%
25                  48     6.2%

Value               Count  Frequency (%)
81                  1      0.1%
72                  1      0.1%
70                  1      0.1%
69                  2      0.3%
68                  1      0.1%

Outcome
Boolean

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 KiB

Value               Count  Frequency (%)
0                   500    65.1%
1                   268    34.9%

Interactions

[Figures: 64 interaction plots, not reproduced here]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
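The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself uses; the function name pearson_r is an assumption made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of
    their standard deviations. The 1/(n-1) factors cancel, so plain
    sums of deviations are enough."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of summed squared deviations (proportional to the
    # standard deviations).
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant input has zero deviation, so r is undefined there
    # and this division raises ZeroDivisionError.
    return cov / (dev_x * dev_y)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
# Shifting and positively rescaling one variable leaves r unchanged.
print(pearson_r(xs, ys), pearson_r([10 * x + 5 for x in xs], ys))
```

The two printed values should agree up to floating-point error, which is the location and scale invariance mentioned in the text.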

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
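As a rough sketch of that recipe, the example below converts each variable to ranks and then applies the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of the positions they occupy, which is a common convention and an assumption here; the function names are illustrative only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    dev_x = math.sqrt(sum((a - mx) ** 2 for a in rx))
    dev_y = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (dev_x * dev_y)


# y = x**2 is monotonic but not linear on positive x.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys))
```

Because the relationship in the usage example is strictly increasing, the ranks of the two variables coincide even though the relationship is not linear.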

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
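The pair-counting definition can be written out directly. The sketch below implements exactly the formula stated in the text, which is usually called tau-a; tie-adjusted variants such as tau-b give different values when the data contain ties. The function name kendall_tau_a is an assumption made for this example.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in X or Y; it counts toward neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The loop visits every pair once, so the cost grows quadratically with the number of observations; that is acceptable for a dataset of this size (768 rows) but not for very large ones.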

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figures: 2 missing-value plots, not reproduced here]

Sample

First rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin     BMI        DiabetesPedigreeFunction  Age  Outcome
0    6            148.0    72.000000      35.000000      79.799479   33.600000  0.627                     50   1
1    1            85.0     66.000000      29.000000      79.799479   26.600000  0.351                     31   0
2    8            183.0    64.000000      20.536458      79.799479   23.300000  0.672                     32   1
3    1            89.0     66.000000      23.000000      94.000000   28.100000  0.167                     21   0
4    0            137.0    40.000000      35.000000      168.000000  43.100000  2.288                     33   1
5    5            116.0    74.000000      20.536458      79.799479   25.600000  0.201                     30   0
6    3            78.0     50.000000      32.000000      88.000000   31.000000  0.248                     26   1
7    10           115.0    69.105469      20.536458      79.799479   35.300000  0.134                     29   0
8    2            197.0    70.000000      45.000000      543.000000  30.500000  0.158                     53   1
9    8            125.0    96.000000      20.536458      79.799479   31.992578  0.232                     54   1

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin     BMI   DiabetesPedigreeFunction  Age  Outcome
758  1            106.0    76.0           20.536458      79.799479   37.5  0.197                     26   0
759  6            190.0    92.0           20.536458      79.799479   35.5  0.278                     66   1
760  2            88.0     58.0           26.000000      16.000000   28.4  0.766                     22   0
761  9            170.0    74.0           31.000000      79.799479   44.0  0.403                     43   1
762  9            89.0     62.0           20.536458      79.799479   22.5  0.142                     33   0
763  10           101.0    76.0           48.000000      180.000000  32.9  0.171                     63   0
764  2            122.0    70.0           27.000000      79.799479   36.8  0.340                     27   0
765  5            121.0    72.0           23.000000      112.000000  26.2  0.245                     30   0
766  1            126.0    60.0           20.536458      79.799479   30.1  0.349                     47   1
767  1            93.0     70.0           31.000000      79.799479   30.4  0.315                     23   0